
37 research outputs found

Approximate Message Passing for Underdetermined Audio Source Separation

Author: Iqbal Turab
Wang Wenwu
Publication date: 01/01/2017

Approximate message passing (AMP) algorithms have shown great promise in sparse signal reconstruction due to their low computational requirements and fast convergence to an exact solution. Moreover, they provide a probabilistic framework that is often more intuitive than alternatives such as convex optimisation. In this paper, AMP is used for audio source separation from underdetermined instantaneous mixtures. In the time-frequency domain, it is typical to assume a priori that the sources are sparse, so we solve the corresponding sparse linear inverse problem using AMP. We present a block-based approach that uses AMP to process multiple time-frequency points simultaneously. Two algorithms known as AMP and vector AMP (VAMP) are evaluated in particular. Results show that they are promising in terms of artefact suppression. Comment: Paper accepted for 3rd International Conference on Intelligent Signal Processing (ISP 2017
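To make the "sparse linear inverse problem" in this abstract concrete, here is a minimal sketch of a textbook AMP iteration with a soft-thresholding denoiser. It is not the authors' block-based method: the denoiser, the threshold rule, and the names `soft_threshold`, `amp`, `alpha`, and `n_iter` are assumptions chosen for illustration, and the matrix `A` is assumed to have roughly unit-norm, i.i.d. Gaussian columns.

```python
import numpy as np

def soft_threshold(v, theta):
    """Element-wise soft thresholding, a simple sparsity-promoting denoiser."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def amp(A, y, n_iter=50, alpha=1.5):
    """Estimate a sparse x from y = A @ x, where A is m x n with m < n.

    Illustrative assumption: the threshold is alpha times an estimate of the
    effective noise level taken from the current residual.
    """
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()  # residual
    for _ in range(n_iter):
        theta = alpha * np.linalg.norm(z) / np.sqrt(m)
        x = soft_threshold(x + A.T @ z, theta)
        # Onsager correction: previous residual scaled by (nonzeros in x) / m.
        z = y - A @ x + z * (np.count_nonzero(x) / m)
    return x
```

The Onsager correction in the residual update is what distinguishes AMP from plain iterative soft thresholding. In a separation setting, `A` would be built from the mixing matrix and `x` would hold the time-frequency coefficients of the sources, which is again an assumption about how the problem is arranged rather than a detail given in the abstract.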

arXiv.org e-Print Archive

University of Surrey

Surrey Research Insight

DCASE 2018 Challenge Surrey Cross-Task convolutional neural network baseline

Author: Iqbal Turab
Kong Qiuqiang
Plumbley Mark D.
Wang Wenwu
Xu Yong
Publication date: 01/01/2018

The Detection and Classification of Acoustic Scenes and Events (DCASE) consists of five audio classification and sound event detection tasks: 1) Acoustic scene classification, 2) General-purpose audio tagging of Freesound, 3) Bird audio detection, 4) Weakly-labeled semi-supervised sound event detection and 5) Multi-channel audio classification. In this paper, we create a cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a "CNN Baseline" system. We implemented CNNs with 4 layers and 8 layers originating from AlexNet and VGG from computer vision. We investigated how the performance varies from task to task with the same configuration of neural networks. Experiments show that the deeper CNN with 8 layers performs better than the CNN with 4 layers on all tasks except Task 1. Using the CNN with 8 layers, we achieve an accuracy of 0.680 on Task 1, an accuracy of 0.895 and a mean average precision (MAP) of 0.928 on Task 2, an accuracy of 0.751 and an area under the curve (AUC) of 0.854 on Task 3, a sound event detection F1 score of 20.8% on Task 4, and an F1 score of 87.75% on Task 5. We released the Python source code of the baseline systems under the MIT license for further research. Comment: Accepted by DCASE 2018 Workshop. 4 pages. Source code availabl

arXiv.org e-Print Archive

University of Surrey

Surrey Research Insight

Weakly Labelled AudioSet Tagging with Attention Neural Networks

Author: Iqbal Turab
Kong Qiuqiang
Plumbley Mark D.
Wang Wenwu
Xu Yong
Yu Changsong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/08/2019

Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We investigate audio tagging on AudioSet, which is a dataset consisting of over 2 million audio clips and 527 classes. AudioSet is weakly labelled, in that only the presence or absence of sound classes is known for each clip, while the onset and offset times are unknown. To address the weakly-labelled audio tagging problem, we propose attention neural networks as a way to attend to the most salient parts of an audio clip. We bridge the connection between attention neural networks and multiple instance learning (MIL) methods, and propose decision-level and feature-level attention neural networks for audio tagging. We investigate attention neural networks modeled by different functions, depths and widths. Experiments on AudioSet show that the feature-level attention neural network achieves a state-of-the-art mean average precision (mAP) of 0.369, outperforming the best multiple instance learning (MIL) method of 0.317 and Google's deep neural network baseline of 0.314. In addition, we discover that the audio tagging performance on AudioSet embedding features has a weak correlation with the number of training samples and the quality of labels of each sound class. Comment: 13 page
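The decision-level attention idea in this abstract can be sketched as a weighted average of per-segment class probabilities, with the weights learned and normalised over time. The sketch below is a simplified illustration, not the paper's architecture: the single linear layers `W_cla` and `W_att`, the softmax normalisation, and the function names are assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def decision_level_attention(segment_feats, W_cla, W_att):
    """Pool per-segment predictions into one clip-level prediction.

    segment_feats: (n_segments, n_features) embeddings of one clip.
    W_cla, W_att:  (n_features, n_classes) hypothetical linear layers.
    Returns an (n_classes,) vector of clip-level probabilities.
    """
    cla = sigmoid(segment_feats @ W_cla)   # per-segment class probabilities
    att_logits = segment_feats @ W_att     # per-segment attention scores
    att = np.exp(att_logits - att_logits.max(axis=0, keepdims=True))
    att = att / att.sum(axis=0, keepdims=True)  # softmax over segments
    return (att * cla).sum(axis=0)
```

Because the attention weights sum to one over segments for each class, the clip-level output stays in [0, 1] and can be trained directly against the weak clip-level labels. With all attention scores equal, the pooling reduces to a plain average over segments.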

arXiv.org e-Print Archive

University of Surrey

Surrey Research Insight

Learning with Out-of-Distribution Data for Audio Classification

Author: Cao Yin
Iqbal Turab
Kong Qiuqiang
Plumbley Mark D.
Wang Wenwu
Publication date: 24/01/2020

In supervised machine learning, the assumption that training data is labelled correctly is not always satisfied. In this paper, we investigate an instance of labelling error for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances: data that does not belong to any of the target classes, but is labelled as such. We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning. The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling. The amount of data required for this is shown to be small. Experiments are carried out on the FSDnoisy18k audio dataset, where OOD instances are very prevalent. The proposed method is shown to improve the performance of convolutional neural networks by a significant margin. Comparisons with other noise-robust techniques are similarly encouraging. Comment: Paper accepted for 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020
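One simple way to picture "detecting and relabelling" with an auxiliary classifier is shown below. This is a hedged sketch only: the abstract does not give the detection rule, so the disagreement test, the confidence `threshold`, and the name `relabel_confident` are assumptions made for illustration.

```python
import numpy as np

def relabel_confident(probs, noisy_labels, threshold=0.9):
    """Relabel instances where the auxiliary classifier confidently disagrees.

    probs:        (n_instances, n_classes) auxiliary-classifier probabilities.
    noisy_labels: (n_instances,) integer labels from the noisy dataset.
    Returns the updated labels and a boolean mask of relabelled instances.
    """
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    # Assumed rule: flag an instance when the prediction differs from its label,
    # and relabel it only if the classifier is sufficiently confident.
    relabel = (pred != noisy_labels) & (conf >= threshold)
    new_labels = np.where(relabel, pred, noisy_labels)
    return new_labels, relabel
```

Instances that are flagged but fall below the threshold keep their original labels in this sketch, which mirrors the abstract's point that only certain instances are relabelled.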

arXiv.org e-Print Archive

Crossref

University of Surrey

Surrey Research Insight

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

Author: An Fengyan
Cao Yin
Iqbal Turab
Kong Qiuqiang
Plumbley Mark
Wang Wenwu
Publication venue: 'New York University'
Publication date: 01/01/2019

Sound event detection (SED) and localization refer to recognizing sound events and estimating their spatial and temporal locations. Using neural networks has become the prevailing method for SED. In the area of sound localization, which is usually performed by estimating the direction of arrival (DOA), learning-based methods have recently been developed. In this paper, it is experimentally shown that the trained SED model is able to contribute to the direction of arrival estimation (DOAE). However, joint training of SED and DOAE degrades the performance of both. Based on these results, a two-stage polyphonic sound event detection and localization method is proposed. The method learns SED first, after which the learned feature layers are transferred for DOAE. It then uses the SED ground truth as a mask to train DOAE. The proposed method is evaluated on the DCASE 2019 Task 3 dataset, which contains different overlapping sound events in different environments. Experimental results show that the proposed method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
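Using the SED ground truth as a mask means that the DOA error is only counted where an event is actually active. The sketch below illustrates that idea with a masked mean squared error; the choice of squared error, the array layout, and the name `masked_doa_loss` are assumptions, not details taken from the paper.

```python
import numpy as np

def masked_doa_loss(doa_pred, doa_true, sed_true):
    """Mean squared DOA error over (frame, class) pairs with an active event.

    doa_pred, doa_true: (n_frames, n_classes, 2) arrays of (azimuth, elevation).
    sed_true:           (n_frames, n_classes) binary ground-truth activity.
    """
    mask = sed_true[..., np.newaxis].astype(float)
    n_active = mask.sum() * doa_pred.shape[-1]
    if n_active == 0:
        return 0.0  # no active events, so there is nothing to penalise
    return float(np.sum(mask * (doa_pred - doa_true) ** 2) / n_active)
```

With this masking, predictions for inactive classes have no effect on the loss, so the DOA branch is not pushed towards arbitrary angles for silent frames.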

arXiv.org e-Print Archive

Crossref

University of Surrey

Surrey Research Insight

New York University Faculty Digital Archive

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

Author: An Fengyan
Cao Yin
Iqbal Turab
Kong Qiuqiang
Plumbley Mark D.
Wang Wenwu
Publication date: 10/02/2021

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously. We study the SELD task from a multi-task learning perspective. Two open problems are addressed in this paper. Firstly, to detect overlapping sound events of the same type but with different DoAs, we propose to use a trackwise output format and solve the accompanying track permutation problem with permutation-invariant training. Multi-head self-attention is further used to separate tracks. Secondly, a previous finding is that, by using hard parameter-sharing, SELD suffers from a performance loss compared with learning the subtasks separately. This is solved by a soft parameter-sharing scheme. We term the proposed method Event Independent Network V2 (EINV2), which is an improved version of our previously-proposed method and an end-to-end network for SELD. We show that our proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models. Comment: 5 pages, 2021 IEEE International Conference on Acoustics, Speech and Signal Processin

arXiv.org e-Print Archive

University of Surrey

Event-Independent Network for Polyphonic Sound Event Localization and Detection

Author: Cao Yin
Iqbal Turab
Kong Qiuqiang
Plumbley Mark D.
Wang Wenwu
Zhong Yue
Publication date: 30/09/2020

Polyphonic sound event localization and detection involves not only detecting which sound events are happening but also localizing the corresponding sound sources. This series of tasks was first introduced in DCASE 2019 Task 3. In 2020, the sound event localization and detection task introduces additional challenges in moving sound sources and overlapping-event cases, which include two events of the same type with two different direction-of-arrival (DoA) angles. In this paper, a novel event-independent network for polyphonic sound event localization and detection is proposed. Unlike the two-stage method we proposed in DCASE 2019 Task 3, this new network is fully end-to-end. Inputs to the network are first-order Ambisonics (FOA) time-domain signals, which are then fed into a 1-D convolutional layer to extract acoustic features. The network is then split into two parallel branches. The first branch is for sound event detection (SED), and the second branch is for DoA estimation. There are three types of predictions from the network: SED predictions, DoA predictions, and event activity detection (EAD) predictions that are used to combine the SED and DoA features for onset and offset estimation. All of these predictions have the format of two tracks, indicating that there are at most two overlapping events. Within each track, there can be at most one event happening. This architecture introduces a problem of track permutation. To address this problem, a frame-level permutation invariant training method is used. Experimental results show that the proposed method can detect polyphonic sound events and their corresponding DoAs. Its performance on the Task 3 dataset is greatly increased as compared with that of the baseline method. Comment: conferenc
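The track permutation problem arises because the network may emit the two events in either track order. Frame-level permutation-invariant training can be sketched as follows: for each frame, evaluate the loss under every assignment of predicted tracks to target tracks and keep the smallest. The squared-error loss, the array layout, and the name `frame_level_pit_loss` are assumptions for illustration and are not taken from the paper.

```python
import itertools
import numpy as np

def frame_level_pit_loss(pred, target):
    """Permutation-invariant loss chosen independently for every frame.

    pred, target: (n_frames, n_tracks, n_outputs) arrays.
    Returns the mean of the per-frame minimum losses and, for each frame,
    the track permutation that achieved it.
    """
    n_tracks = pred.shape[1]
    perms = list(itertools.permutations(range(n_tracks)))
    # losses[p, t]: error at frame t when predicted tracks are reordered by perms[p].
    losses = np.stack([
        np.mean((pred[:, list(p), :] - target) ** 2, axis=(1, 2))
        for p in perms
    ])
    best = losses.argmin(axis=0)
    return float(losses.min(axis=0).mean()), [perms[i] for i in best]
```

With two tracks there are only two permutations per frame, so the exhaustive search is cheap. The number of permutations grows factorially with the number of tracks, which is one reason to keep the track count small.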

arXiv.org e-Print Archive

University of Surrey

Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events

Author: Abadi Martín
Gehring Jonas
Hershey Shawn
Iqbal Turab
Mesaros Annamaria
Parekh Sanjeel
Plumbley Mark D.
Simonyan Karen
Virtanen Tuomas
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/12/2019

We tackle the task of environmental event classification by drawing inspiration from the transformer neural network architecture used in machine translation. We modify this attention-based feedforward structure in such a way that allows the resulting model to use audio as well as video to compute sound event predictions. We perform extensive experiments with these adapted transformers on an audiovisual data set, obtained by appending relevant visual information to an existing large-scale weakly labeled audio collection. The employed multi-label data contains clip-level annotation indicating the presence or absence of 17 classes of environmental sounds, and does not include temporal information. We show that the proposed modified transformers strongly improve upon previously introduced models and in fact achieve state-of-the-art results. We also make a compelling case for devoting more attention to research in multimodal audiovisual classification by proving the usefulness of visual information for the task at hand, namely audio event recognition. In addition, we visualize internal attention patterns of the audiovisual transformers and in doing so demonstrate their potential for performing multimodal synchronization.

arXiv.org e-Print Archive

Crossref

Early life child micronutrient status, maternal reasoning, and a nurturing household environment have persistent influences on child cognitive development at age 5 years: Results from MAL-ED

Author: A Catharine Ross
A M Shamsir Ahmed
Aboud
Ajila T George
Alberto M Soares
Aldo A M Lima
Alessandra Di Moura
Alexandre Havt
Algarin
Ali Turab
Amidou Samie
Angel Mendez Acosta
Angel Orbe Vasquez
Angelina Maphula
Anita K M Zaidi
Anup Ramachandran
Anuradha Bose
Archana Mohale
Asad Ali
Aubrey Bauck
Bacharach
Baitun Nahar
Barbara Schaefer
Beena Koshy
Benjamin J J McCormick
Bentley
Beusenberg
Biesalski
Binob Shrestha
Black
Black
Bradley
Bruna L L Maciel
Buliga Mujaga Swema
Caldwell
Carl J Mason
Caroline Amour
Caulfield
Cesar Banda Chavez
Christel Hoest
Cloupas Mahopo
Cláudia B Abreu
Crystal L Patil
Dennis R Lang
Didar Alam
Dinesh Hariraju
Dinesh Mondal
Dixner Rengifo Trigoso
Dror
Eliwaza Bayyo
Elizabeth T Rogawski
Emanuel Nyathi
Engle
Eric Houpt
Erling Svensen
Eshel
Estomih R Mduma
Fahmida Tofail
Francisco S Mota
Gagandeep Kang
Gaurvika Nayyar
Georgiadis
Georgiadis
Gwenyth Lee
Hilda Costa
Horta
Houpt
Ila F Lima
Imran Ahmed
Iqbal Hossain
J Daniel Carreon
Jacobs
James Platts-Mills
Jayaprakash Muliyil
Jean Gratz
Jessica C Seidman
Jhanelle Graham
John Pascal
Jones
Josiane Quetz
José Quirino Filho
Julian Torres Flores
Karen H Tountas
Karthikeyan Ramanujam
Kerry Schulze
Kosek
Kosek
Ladaporn Bodhidatta
Ladislaus Blacy
Ladislaus Yarrot
Laura E Caulfield
Laura E Murray-Kolb
Laura L Pendergast
Laura Pendergast
Leah Barrett
Liu
Liu
Lozoff
Lu
M Steffi Jennifer
MAL-ED Network Investigators
Manjeswori Ulak
Margaret N Kosek
Maribel Paredes Olotegui
Mark A Miller
McCormick
Mery Siguas Salas
Michael Gottlieb
Milena Moraes
Mohan Venkata Raghava
Monica McGrath
Muneera Rasheed
Munirul Islam
Murray-Kolb
Mustafa Mahfuz
Namaste
Nkrumah
Noélia L Lima
Pablo Peñataro Yori
Pascal Bessong
Pedro H Q S Medeiros
Pendergast
Petry
Pinkerton
Platts-Mills
Prado
Prakash Sunder Shrestha
Priyadarshani Karunakaran
Psaki
Pérez-Escamilla
Rahul J Thomas
Rakhi Ramadas
Ram Krishna Chandyo
Ramya Ambikapathi
Rashidul Haque
Raven
Rebecca Blank
Rebecca Dillingham
Rebecca Scharf
Reeba Roshan
Regisiana Mvungi
Reinaldo B Oriá
Reinaldo Oria
Richard
Richard
Richard L Guerrant
Rita Shrestha
Robert E Black
Robin P Lazarus
Ronfani
Rosa M S Mota
Rosa Rios de Burga
Rosemary Nshama
Ruan
Sajid Soofi
Samuel P Scott
Sanjaya Kumar Shrestha
Santos
Sayma Haque
Shahida Qureshi
Shanmuga Sundaram E
Shiny Kaki
Silvia Rengifo Pinedo
Sophy Raju
Srujan L Sharma
Stacey Knobler
Stephanie A Richard
Stephanie Psaki
Sudhir Babji
Sushil John
Suzanne Simons
Tahmeed Ahmed
Tor Strand
Tucker
van Buuren
Victora
Vivek Charu
Vivian Wang
Viyada Doan
Walker
Weschler
WHO
WHO
Willett
William A Petri
William Checkley
William K Pan
Zamora
Zeba Rasmussen
Zulfiqar A Bhutta
Álvaro M Leite
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2019

Background: Child cognitive development is influenced by early-life insults and protective factors. To what extent these factors have a long-term legacy on child development and hence fulfillment of cognitive potential is unknown. Objective: The aim of this study was to examine the relation between early-life factors (birth to 2 y) and cognitive development at 5 y. Methods: Observational follow-up visits were made of children at 5 y, previously enrolled in the community-based MAL-ED longitudinal cohort. The burden of enteropathogens, prevalence of illness, complementary diet intake, micronutrient status, and household and maternal factors from birth to 2 y were extensively measured and their relation with the Wechsler Preschool Primary Scales of Intelligence at 5 y was examined through use of linear regression. Results: Cognitive T-scores from 813 of 1198 (68%) children were examined and 5 variables had significant associations in multivariable models: mean child plasma transferrin receptor concentration (β: −1.81, 95% CI: −2.75, −0.86), number of years of maternal education (β: 0.27, 95% CI: 0.08, 0.45), maternal cognitive reasoning score (β: 0.09, 95% CI: 0.03, 0.15), household assets score (β: 0.64, 95% CI: 0.24, 1.04), and HOME child cleanliness factor (β: 0.60, 95% CI: 0.05, 1.15). In multivariable models, the mean rate of enteropathogen detections, burden of illness, and complementary food intakes between birth and 2 y were not significantly related to 5-y cognition. Conclusions: A nurturing home context in terms of a healthy/clean environment and household wealth, provision of adequate micronutrients, maternal education, and cognitive reasoning have a strong and persistent influence on child cognitive development. Efforts addressing aspects of poverty around micronutrient status, nurturing caregiving, and enabling home environments are likely to have lasting positive impacts on child cognitive development.

University of Bergen

Crossref

NORA - Norwegian Open Research Archives